Fast and exact out-of-core and distributed k-means clustering

نویسندگان
چکیده

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Lightning fast asynchronous distributed k-means clustering

One of the most fundamental data processing approach is the clustering. This is even true in distributed architectures. Here, we focus on the problem of designing efficient and fast K-Means approaches which work in fully distributed, asynchronous networks without any central control. We assume that the network has a huge number of computational units (even orders of magnitude more than the numb...

متن کامل

Fast k-means algorithm clustering

k-means has recently been recognized as one of the best algorithms for clustering unsupervised data. Since k-means depends mainly on distance calculation between all data points and the centers, the time cost will be high when the size of the dataset is large (for example more than 500millions of points). We propose a two stage algorithm to reduce the time cost of distance calculation for huge ...

متن کامل

Fast Exact k-Means, k-Medians and Bregman Divergence Clustering in 1D

The k-Means clustering problem on n points is NP-Hard for any dimension d ≥ 2, however, for the 1D case there exist exact polynomial time algorithms. Previous literature reported an O(kn) time dynamic programming algorithm that uses O(kn) space. We present a new algorithm computing the optimal clustering in only O(kn) time using linear space. For k = Ω(lg n), we improve this even further to n2 ...

متن کامل

Distributed PCA and k-Means Clustering

This paper proposes a distributed PCA algorithm, with the theoretical guarantee that any good approximation solution on the projected data for k-means clustering is also a good approximation on the original data, while the projected dimension required is independent of the original dimension. When combined with the distributed coreset-based clustering approach in [3], this leads to an algorithm...

متن کامل

Distributed k-Means and k-Median Clustering on General Topologies

This paper provides new algorithms for distributed clustering for two popular center-based objectives, k-median and k-means. These algorithms have provable guarantees and improve communication complexity over existing approaches. Following a classic approach in clustering by [13], we reduce the problem of finding a clustering with low cost to the problem of finding a coreset of small size. We p...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Knowledge and Information Systems

سال: 2005

ISSN: 0219-1377,0219-3116

DOI: 10.1007/s10115-005-0210-0